What is the most competitive sport? 
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We present an extensive statistical analysis of the results of all sports competitions in five major 
sports leagues in England and the United States. We characterize the parity among teams by the 
variance in the winning fraction from season-end standings data and quantify the predictability of 
games by the frequency of upsets from game results data. We introduce a mathematical model in 
which the underdog team wins with a fixed upset probability. This model quantitatively relates the 
parity among teams with the predictability of the games, and it can be used to estimate the upset 
frequency from standings data. We propose the likelihood of upsets as a measure of competitiveness. 



What is the most competitive team sport? We answer 
this question via a statistical survey of game results Q, 
0, 0, Q ■ We relate parity with predictability, and propose 
the likelihood of upsets as a measure of competitiveness. 

We studied the results of all regular season competi- 
tions in 5 major professional sports leagues in England 
and the United States (table I) : the premier soccer league 
of the English Football Association (FA), Major League 
Baseball (MLB), the National Hockey League (NHL), 
the National Basketball Association (NBA), and the Na- 
tional Football League (NFL). NFL data includes the 
short-lived AFL. We considered only complete seasons, 
with more than 300,000 games in over a century 

The winning fraction, the ratio of wins to total games, 
quantifies team strength. Thus the distribution of win- 
ning fraction measures the parity between teams in a 
league. We computed F(x), the fraction of teams with a 
winning fraction of x or lower at the end of the season, as 
well as a = (x 2 ) — (x) 2 , the standard deviation in win- 
ning fraction. Here (•) denotes the average over all teams 
and all years using season-end standings. For example, 
in baseball where the winning fraction x typically falls 
between 0.400 and 0.600, the variance is a = 0.084. As 
shown in figures 1 and 2a, the winning fraction distribu- 
tion clearly distinguishes the five leagues. It is narrowest 
for baseball and widest for football. 
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FA 


1888-2005 


43350 


39.7 


0.102 


0.452 


0.459 


MLB 


1901-2005 


163720 


155.5 


0.084 


0.441 


0.413 


NHL 


1917-2004 


39563 


70.8 


0.120 


0.414 


0.383 


NBA 


1946-2005 


43254 


79.1 


0.150 


0.365 


0.316 


NFL 


1922-2004 


11770 


14.0 


0.210 


0.364 


0.309 



TABLE I: Summary of the sports statistics data. Listed are 
the time periods, total number of games, average number of 
games played by a team in a season ((games)), variance in 
the win-percentage distribution (<r), measured frequency of 
upsets (q), and upset probability obtained using the theoreti- 
cal model ((/model). The fraction of ties in soccer, hockey, and 
football is 0.246, 0.144, and 0.016, respectively. 



FIG. 1: Winning fraction distribution (curves) and the best- 
fit distributions from simulations of our model (circles). For 
clarity, FA, that lies between MLB and NHL, is not displayed. 



Do these results imply that MLB games are the most 
competitive and NFL games the least? Not necessarily! 
The length of the season is a significant factor in the 
variability in the winning fraction. In a scenario where 
the outcome of a game is completely random, the to- 
tal number of wins performs a simple random walk, and 
the standard deviation a is inversely proportional to the 
square root of the number of games played. Generally, 
the shorter the season, the larger a. Thus, the small 
number of games is partially responsible for the large 
variability observed in the NFL. 

To account for the varying season length and reveal the 
true nature of the sport, we set up mock sports leagues 
where teams, paired at random, play a fixed number of 
games. In this simulation model, the team with the bet- 
ter record is considered as the favorite and the team with 
the worse record is considered as the underdog. The out- 
come of a game depends on the relative team strengths: 
with the "upset probability" q < 1/2, the underdog wins, 
but otherwise, the favorite wins. Our analysis of the non- 
linear master equations that describe the evolution of the 
distribution of team win/loss records shows that a de- 
creases both as the season length increases and as games 
become more competitive, i.e., as q increases ja|. In a 
hypothetical season with an infinite number of games, 
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FIG. 2: (a) The cumulative variance in the winning fraction 
distribution (for all seasons up to a given year) versus time, 
(b) The cumulative frequency of upsets q, measured directly 
from game results, versus time. 

the winning fraction distribution is uniform in the range 
q < x < 1 — q and as a result, a = (1/2 — <?)/\/3. 

We run Monte Carlo simulations of these artificial 
sports leagues, with sport-specific number of games and 
a range of q values. We then determine the value of q 
that gives the best match between the distribution F(x) 
from the simulations to the actual sports statistics (fig- 
ure 1). Generally, we find good agreement between the 
simulations results and the data for reasonable q values. 

To characterize the predictability of games, we followed 
the chronologically-ordered results of all games and re- 
constructed the league standings at any given day. We 
then measured the upset frequency q by counting the 
fraction of times that the team with the worse record 
on the game date actually won (table I). Games between 
teams with no record (start of a season) or teams with 
equal records were disregarded. Game location was ig- 
nored and so was the margin of victory. In soccer, hockey, 
and football, ties were counted as 1/2 of a victory for 
both teams. We verified that handling ties this way did 
not significantly affect the results: the upset probability 
changes by at most 0.02 (and typically, much less) if ties 
are ignored. 

We find that soccer and baseball are the most com- 
petitive sports with q = 0.452 and q = 0.441, respec- 
tively, while basketball and football, with nearly identical 
q = 0.365 and q = 0.364, are the least. There is also good 



agreement between the upset probability q mo deh obtained 
by fitting the winning fraction distribution from numer- 
ical simulations of our model to the data as in figure 1, 
and the measured upset frequency (table I). Consistent 
with our theory, the variance a mirrors the bias, 1/2 — q 
(figures 2a and 2b). Tracking the evolution of either q 
or a leads to the same conclusion: NFL and MLB games 
are becoming more competitive, while over the past 60 
years, FA displays an opposite trend. 

In summary, we propose a single quantity, q, the 
frequency of upsets, as an index for quantifying the 
predictability, and hence the competitiveness of sports 
games. We demonstrated the utility of this measure via a 
comparative analysis that shows that soccer and baseball 
are the most competitive sports. Trends in this measure 
may reflect the gradual evolution of the teams in response 
to competitive pressure 0, as well as changes in game 
strategy or rules || . 

Our modeL in which the stronger team is favored to 
win a game |6J , enables us to take into account the vary- 
ing season length and this model directly relates par- 
ity, as measured by the variance a with predictability, as 
measured by the upset likelihood q. This connection has 
practical utility as it allows one to conveniently estimate 
the likelihood of upsets from the more easily-accessible 
standings data. In our theory, all teams are equal at the 
start of the season, but by chance, some end up strong 
and some weak. Our idealized model does not include the 
notion of innate team strength; nevertheless, the spon- 
taneous emergence of disparate-strength teams provides 
the crucial mechanism needed for quantitative modeling 
of the complex dynamics of sports competitions. 
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